Abstract

For each probability distribution on a countable alphabet, a sequence of positive
functionals is developed as tail indices based on Turing’s perspective. By and only
by the asymptotic behavior of these indices, domains of attraction for all probability
distributions on the alphabet are defined. The three main domains of attraction are
shown to contain distributions with thick tails, thin tails and no tails respectively,
resembling in parallel the three main domains of attraction, the Gumbel, Fréchet and
Weibull families, for continuous random variables on the real line. In addition to
the probabilistic merits associated with the domains, the tail indices are partially
motivated by the fact that there exists an unbiased estimator for every index in
the sequence, which is therefore statistically observable, provided that the sample is
sufficiently large.


1 Introduction and Summary. 

Consider an alphabet with countably many letters $\mathscr{X} = \{\ell_k; k \ge 1\}$ and an associated
probability distribution $P = \{p_k; k \ge 1\} \in \mathscr{P}$, where $\mathscr{P}$ is the class of all probability
distributions on $\mathscr{X}$. Let $X_1, \dots, X_n$ be an independently and identically distributed (iid)
random sample from $\mathscr{X}$ under $P$. Let $\{Y_k; k \ge 1\}$ and $\{\hat{p}_k = Y_k/n; k \ge 1\}$ be the observed
letter frequencies and relative letter frequencies in the sample.

Before proceeding further, let us first give a little thought to possible notions of an
“extreme value” and a “tail” of a distribution in the current setting, as the domains of
attraction are commonly discussed in association with such notions. While such notions
are not required in the mathematics of this paper, it is nevertheless comforting to have
them at least on an intuitive level. Unlike an iid sample of a random variable on the real
line, where the values are numerically ordered and therefore an extreme value is naturally
defined, the letters in an alphabet do not assume numerical values, nor do they admit a
natural ordering. It is much less clear what a reasonable notion of an extreme value should
be in such a case. If we insist on having a notion of an extreme value associated with
a sample, then perhaps such a value should be based on its rarity or unusualness with
respect to the observed values in the sample. The rarest values in the sample are those
with frequency one, and there are most commonly many more than one such observed value
in a sample. If we entertain a rarer value, it has to be one with frequency zero, i.e., a
letter in the alphabet that is not represented in the sample, which, though not in the
sample, is nevertheless associated with and specified by the sample. If we anticipate that
another iid observation, say $X_{n+1}$, is to be taken, it would be reasonable then to
consider the value of $X_{n+1}$ to be extreme if $X_{n+1}$ takes a letter that is not observed in the
original sample of size $n$. To fix the idea, we will subsequently use the term “an extreme
value” to mean that a new observation $X_{n+1}$ assumes a value unseen in the sample of size
$n$. Similarly we can also entertain what a notion of a tail should be on an alphabet.
Whenever there is no risk of ambiguity, let us loosely refer to a subset of $\mathscr{X}$ with low
probability letters as a “tail” in the subsequent text. In this sense, a subset of $\mathscr{X}$ with
very low probability letters may be referred to as a “distant tail”, and a distribution on
a finite alphabet has essentially “no tail”. Furthermore we note that, though there is no
natural ordering among the letters in $\mathscr{X}$, there is one on the index set $\{k; k \ge 1\}$. There
therefore exists a natural notion of a distribution $P = \{p_k\}$ having a thinner tail than that
of another distribution $Q = \{q_k\}$, in the sense of $p_k \le q_k$ for all $k \ge k_0$ for some integer
$k_0 \ge 1$, when $P$ and $Q$ share the same alphabet and are enumerated by the same index set. In
such a case, we will subsequently say that $P$ has a thinner tail than $Q$ in the usual sense.
Finally we note that the discussion of domains of attraction for continuous random variables
very much hinges on a well-defined extreme value, which is lacking on alphabets, and the
differentiability of its cumulative distribution function, which is completely non-existent
due to the discrete nature of alphabets. As a result of these characteristics, or the lack
of them, in the current problem concerning distributions on alphabets, a fundamentally
different theoretical platform is needed to move forth.

To move forth on an intuitive note, let us adopt the notion of an out-of-sample extreme
value as described above. We may then entertain the probability of $X_{n+1}$ being an extreme
value, i.e., $P\left(\cap_{i=1}^{n}\{X_{n+1} \ne X_i\}\right)$, which is, after a few algebraic steps,

$$\zeta_{1,n} = \sum_{k \ge 1} p_k (1 - p_k)^n.$$

Remark 1. $\zeta_{1,n}$ is a member of the family of the generalized Simpson’s indices $\zeta_{u,v}$ discussed
by Zhang and Zhou (2010), which plays an important role in characterizing the underlying
distribution $\{p_k\}$ (up to a permutation on the index set) and in giving alternative
representations to Shannon’s entropy and Rényi’s entropy, which are well-known tail indices on an
alphabet, as discussed in Zhang (2012).

Clearly $\zeta_{1,n} \to 0$ as $n \to \infty$ for any probability distribution $\{p_k\}$ on $\mathscr{X}$. A multiplicatively
adjusted version of $\zeta_{1,n}$ is defined below and will subsequently be referred to as the
tail index.

$$\tau_n = n\zeta_{1,n} = \sum_{k \ge 1} n p_k (1 - p_k)^n. \qquad (1)$$

Remark 2. Suppose there are two independent iid samples of the same size $n$. The tail
index in (1) may also be interpreted as the average number of observations in one sample
that are not found in the other sample.
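The interpretation in Remark 2 can be checked numerically. The following is a minimal sketch, assuming a small hypothetical distribution `probs` and hypothetical helper names (`tail_index`, `unseen_in_other_sample`); it compares the tail index, computed from its defining sum, with a Monte Carlo average of the number of observations in one sample whose letter is absent from a second independent sample.

```python
import random

def tail_index(probs, n):
    """tau_n = sum_k n * p_k * (1 - p_k)**n for a finite list of probabilities."""
    return sum(n * p * (1.0 - p) ** n for p in probs)

def unseen_in_other_sample(probs, n, rng):
    """Count observations of one iid sample whose letter is absent from a second iid sample."""
    letters = range(len(probs))
    first = rng.choices(letters, weights=probs, k=n)
    second = set(rng.choices(letters, weights=probs, k=n))
    return sum(1 for x in first if x not in second)

probs = [0.5, 0.25, 0.125, 0.0625, 0.0625]  # hypothetical example distribution
n = 20
rng = random.Random(0)                       # fixed seed for reproducibility
reps = 20000
average = sum(unseen_in_other_sample(probs, n, rng) for _ in range(reps)) / reps
print(tail_index(probs, n), average)         # the two numbers should be close
```

The agreement follows from linearity of expectation: each of the $n$ observations in the first sample is unseen in the second with probability $\sum_k p_k(1-p_k)^n$.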

The fact that $\tau_n$ is tail-relevant is manifested in the fact that $\zeta_{1,n}$ is tail-relevant. To see
that $\zeta_{1,n}$ is tail-relevant, let us first consider $\pi_0 = \sum_{k \ge 1} p_k 1[Y_k = 0]$. $1 - \pi_0$ is often referred
to as the sample coverage of a population in the literature. Since the letters not represented
in a large sample are likely those with low probabilities, it is reasonable to think that $\pi_0$ is
a tail-relevant quantity for a large $n$; and yet $\zeta_{1,n} = E(\pi_0)$. Intuitively one would expect $\pi_0$
to take a smaller (larger) value under a more (less) concentrated probability distribution,
and therefore expect $\zeta_{1,n}$, and hence $\tau_n$, to be a reasonable measure to characterize the
tail of a distribution on an alphabet. Also to be noted is that, for any given integer $k_0 \ge 1$,
the first $k_0$ terms in the re-expression of $\tau_n$ below converge to zero exponentially fast as
$n \to \infty$,

$$\tau_n = \sum_{k \le k_0} n p_k (1 - p_k)^n + \sum_{k > k_0} n p_k (1 - p_k)^n,$$

and therefore the asymptotic behavior of $\tau_n$ has essentially nothing to do with how the
probabilities are distributed over any fixed and finite subset of $\mathscr{X}$, further noting that $\tau_n$
is invariant under any permutation on the index set $\{k\}$.

Remark 3. Good (1953) introduced a remarkable estimator of $\pi_0$ in the form of $N_1/n$
where $N_1 = \sum_{k \ge 1} 1[Y_k = 1]$. The estimator, also known as Turing’s formula, is the subject
of much research in the existing literature. Notable papers on this topic include Robbins
(1968) and Esty (1983), and more recent advances are reported in Zhang and Huang (2008),
Zhang and Zhang (2009) and Zhang (2013). One of the most intriguing characteristics of
Turing’s formula is its ability to infer nonparametrically the probability beyond the range
of observed data.
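As a small illustration of Turing’s formula, the sketch below counts the letters observed exactly once and divides by the sample size. The function name `turing_formula` and the toy sample are assumptions made for illustration only.

```python
from collections import Counter

def turing_formula(sample):
    """Return N_1 / n, where N_1 is the number of letters seen exactly once."""
    counts = Counter(sample)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(sample)

sample = list("abracadabra")   # hypothetical sample of n = 11 letters
print(turing_formula(sample))  # 'c' and 'd' are the only singletons, so N_1 = 2
```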

Remark 4. Domains of attraction for distributions of continuous random variables are a
long-standing focal point of extreme value theory. The large volume of research on this
topic in the existing literature goes back to Fréchet (1927) and Fisher and Tippett (1928),
and includes full analyses by Gnedenko (1944) and Smirnov (1949). There the three main
domains of attraction are defined along the lines of the Gumbel family (thick tails), the Fréchet
family (thin tails) and the Weibull family (no tails). The main objective of this paper is to
similarly characterize many distributions on alphabets by the indices $\{\tau_n; n \ge 1\}$ into three
domains: Domain 0 (no tails), Domain 1 (thin tails), and Domain 2 (thick tails).
Definition 1. A distribution $P = \{p_k\}$ on $\mathscr{X}$ is said to belong to

1. Domain 0 if $\lim_{n \to \infty} \tau_n = 0$,

2. Domain 1 if $\limsup_{n \to \infty} \tau_n = c_P$ for some constant $c_P > 0$,

3. Domain 2 if $\lim_{n \to \infty} \tau_n = \infty$, and

4. Domain T, or Domain Transient, if it does not belong to Domains 0, 1, or 2.
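To make the definition concrete, the following sketch evaluates the tail index for three hypothetical distributions: one on a finite alphabet, one with a geometric tail, and one with a power tail. The infinite alphabets are truncated to finitely many letters (an assumption needed for computation), so the numbers are only indicative for moderate $n$.

```python
def tail_index(probs, n):
    """tau_n = sum_k n * p_k * (1 - p_k)**n for a finite list of probabilities."""
    return sum(n * p * (1.0 - p) ** n for p in probs)

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

finite = [0.4, 0.3, 0.2, 0.1]                                # finite alphabet
geometric = normalize([2.0 ** -k for k in range(1, 200)])    # truncated p_k ~ 2^{-k}
power = normalize([k ** -2.0 for k in range(1, 100001)])     # truncated p_k ~ k^{-2}

for n in (10, 100, 1000, 10000):
    print(n, tail_index(finite, n), tail_index(geometric, n), tail_index(power, n))
```

In this experiment the first column of indices shrinks toward zero, the second stays within a bounded band, and the third keeps growing, in line with Domains 0, 1 and 2 respectively.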


The four domains so defined above form a partition of $\mathscr{P}$. The primary results
established in this paper include:

1. Domain 0 includes exactly the probability distributions with positive probabilities
on a finite subset of $\mathscr{X}$.

2. Domain 1 includes distributions with thin tails such as $p_k = O(a^{-\lambda k})$, $p_k = O(a^{-\lambda k^2})$
and $p_k = O(k^r a^{-\lambda k})$, where $a > 1$, $\lambda > 0$ and $r \in (-\infty, \infty)$ are constants.

3. Domain 2 includes distributions with thick tails such as $p_k = O(k^{-\lambda})$ and
$p_k = O((k \ln^{\lambda} k)^{-1})$, where $\lambda > 1$.

4. A relative regularity condition between two distributions (one dominates the other)
is defined. Under this condition, all distributions on a countably infinite alphabet
that are dominated by a Domain 1 distribution must also belong to Domain 1.

5. Domain T is not empty.

The secondary results established in this paper include:

1. In Domain 0, $\tau_n \to 0$ exponentially fast for every distribution.

2. The tail index $\tau_n$ of a distribution with tail $p_k = O(e^{-\lambda k})$, where $\lambda > 0$, in Domain
1 perpetually oscillates between two positive constants and does not have a limit as
$n \to \infty$.

3. There is a uniform positive lower bound for $\limsup_{n \to \infty} \tau_n$ for all distributions with
positive probabilities on infinitely many letters of $\mathscr{X}$.

All above mentioned results are given in Section 2. Section 3 includes several constructed
examples, each of which illustrates a point of interest. The paper ends with a brief discussion
in Section 4 on the statistical implication of the established results.

2 Main Results. 

Let $K$ be the effective cardinality, or simply the cardinality when there is no ambiguity, of
$\mathscr{X}$, i.e., $K = \sum_{k \ge 1} 1[p_k > 0]$.

Lemma 1. If $K = \infty$, then there exists a subsequence $\{n_k; k \ge 1\}$ in $\mathbb{N}$, satisfying $n_k \to \infty$
as $k \to \infty$, such that $\tau_{n_k} > c > 0$ for all sufficiently large $k$.

Proof. Let us assume without loss of generality that $p_k > 0$ for all $k \ge 1$. Since $\zeta_{1,n}$ is
invariant with respect to any permutation on the index set $\{k; k \ge 1\}$, it can be assumed
without loss of generality that $\{p_k\}$ is non-increasing in $k$. For every $k$, let $n_k = \lfloor 1/p_k \rfloor$.
With $n_k$ so defined, we have $1/(n_k + 1) < p_k \le 1/n_k$ for every $k$ and $\lim_{k \to \infty} n_k = \infty$,
though $\{n_k\}$ may not necessarily be strictly increasing. By construction, the following are
true about the $n_k$, $k \ge 1$.

1. $\{n_k; k \ge 1\}$ is an infinite subset of $\mathbb{N}$.

2. Every $p_k$ is covered by the interval $(1/(n_k + 1), 1/n_k]$.

3. Every interval $(1/(n_k + 1), 1/n_k]$ covers at least one $p_k$ and at most finitely many $p_k$s.

Let $f_n(x) = nx(1 - x)^n$ for $x \in [0, 1]$. $f_n(x)$ attains its maximum at $x = (n + 1)^{-1}$ with
value

$$f_n\left(\frac{1}{n+1}\right) = \frac{n}{n+1}\left(1 - \frac{1}{n+1}\right)^n = \left(1 - \frac{1}{n+1}\right)^{n+1}.$$

Also we have

$$f_n\left(\frac{1}{n}\right) = \left(1 - \frac{1}{n}\right)^n.$$

Furthermore, since $f_n'(x) < 0$ for $(n + 1)^{-1} < x < 1$, we have

$$f_n\left(\frac{1}{n}\right) < f_n(x) < f_n\left(\frac{1}{n+1}\right) \quad \text{for } \frac{1}{n+1} < x < \frac{1}{n}.$$

Since $f_n(1/n) \to e^{-1}$ and $f_n(1/(n + 1)) \to e^{-1}$, for any arbitrarily small but fixed $\varepsilon > 0$
there exists a positive $N_\varepsilon$ such that for any $n > N_\varepsilon$, $f_n(1/(n + 1)) > f_n(1/n) > e^{-1} - \varepsilon$.

Since $\lim_{k \to \infty} n_k = \infty$ and $\{n_k\}$ is non-decreasing, there exists an integer $K_\varepsilon > 0$ such
that $n_k > N_\varepsilon$ for all $k > K_\varepsilon$. Consider the subsequence $\{\tau_{n_k}; k \ge 1\}$. For any $k > K_\varepsilon$,

$$\tau_{n_k} = \sum_{i=1}^{\infty} n_k p_i (1 - p_i)^{n_k} > f_{n_k}(p_k).$$

Since $p_k \in (1/(n_k + 1), 1/n_k]$ and $f_{n_k}(x)$ is decreasing on the interval $(1/(n_k + 1), 1/n_k]$, we
have

$$f_{n_k}(p_k) \ge f_{n_k}\left(\frac{1}{n_k}\right) > e^{-1} - \varepsilon,$$

and hence $\tau_{n_k} > f_{n_k}(p_k) > e^{-1} - \varepsilon$ for all $k > K_\varepsilon$. □
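The key analytic fact used in this proof, that a single summand $f_n(x) = nx(1-x)^n$ evaluated at $x = 1/n$ or $x = 1/(n+1)$ approaches $e^{-1}$, can be inspected numerically. The sketch below uses a hypothetical helper `f` and a few arbitrary values of $n$.

```python
import math

def f(n, x):
    """f_n(x) = n * x * (1 - x)**n, one summand of the tail index."""
    return n * x * (1.0 - x) ** n

for n in (10, 100, 1000, 10000):
    at_peak = f(n, 1.0 / (n + 1))   # maximum of f_n over [0, 1]
    at_one_over_n = f(n, 1.0 / n)
    print(n, at_one_over_n, at_peak, math.exp(-1))
```

Both columns increase toward $e^{-1} \approx 0.3679$, with the value at $1/n$ always slightly below the value at $1/(n+1)$.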

Theorem 1. $K < \infty$ if and only if

$$\lim_{n \to \infty} \tau_n = 0. \qquad (2)$$

Proof. Assuming that $P = \{p_k; 1 \le k \le K\}$ where $K$ is finite and $p_k > 0$ for all $k$,
$1 \le k \le K$, and denoting $p_0 = \min\{p_k; 1 \le k \le K\} > 0$, the necessity of (2) follows from the
fact that, as $n \to \infty$,

$$\tau_n = n \sum_k p_k (1 - p_k)^n \le n \sum_k p_k (1 - p_0)^n = n(1 - p_0)^n \to 0.$$

The sufficiency of (2) follows from the fact that, if $K = \infty$, then Lemma 1 would provide a
contradiction to (2). □
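The bound used in the necessity part of the proof can be verified directly on a small example. This is a sketch with a hypothetical finite distribution; `tail_index` is an illustrative helper, not notation from the paper.

```python
def tail_index(probs, n):
    """tau_n = sum_k n * p_k * (1 - p_k)**n for a finite list of probabilities."""
    return sum(n * p * (1.0 - p) ** n for p in probs)

probs = [0.5, 0.3, 0.15, 0.05]   # hypothetical distribution on a finite alphabet
p0 = min(probs)                  # smallest positive probability

for n in (10, 50, 100, 500):
    bound = n * (1.0 - p0) ** n  # the bound n * (1 - p_0)^n from the proof
    print(n, tail_index(probs, n), bound)
```

Both the index and its bound decay geometrically in $n$ up to the factor $n$.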

In fact the proof of Theorem 1 also establishes the following corollary.

Corollary 1. $K < \infty$ if and only if $\tau_n \le O(n q_0^n)$ where $q_0$ is a constant in $(0, 1)$.

Theorem 1 and Corollary 1 firmly characterize Domain 0 as a family of distributions on
finite alphabets. All distributions outside of Domain 0 must have positive probabilities on
infinitely many letters of $\mathscr{X}$. The entire class of such distributions is denoted as $\mathscr{P}_+$. In
fact in the subsequent text, when there is no ambiguity, $\mathscr{P}_+$ will denote the entire class of
distributions with a positive probability on every $\ell_k$ in $\mathscr{X}$. For all distributions in $\mathscr{P}_+$, a
natural group would be those for which $\lim_{n \to \infty} \tau_n = \infty$, and so Domain 2 is defined.

The following three lemmas are useful in the proof of Theorem 2 below, which puts
distributions with a power decaying or a slower tail in Domain 2. Lemma 2 is a version of
the well-known Euler-Maclaurin formula and therefore is referred to as the Euler-Maclaurin
Lemma subsequently.

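Lemma 2. (Euler-Maclaurin) Let $f_n(x)$ be a continuous function of $x$ on $[x_0, \infty)$ where $x_0$
is a positive integer. Suppose $f_n(x)$ is increasing on $[x_0, x(n)]$ and decreasing on $[x(n), \infty)$.
If $f_n(x_0) \to 0$ and $f_n(x(n)) \to 0$, then

$$\lim_{n \to \infty} \sum_{k \ge x_0} f_n(k) = \lim_{n \to \infty} \int_{x_0}^{\infty} f_n(x)\,dx.$$

Proof. It can be verified that

$$\sum_{x_0 \le k \le x(n)} f_n(k) - f_n(x(n)) \le \int_{x_0}^{x(n)} f_n(x)\,dx \le \sum_{x_0 + 1 \le k \le x(n)} f_n(k) + f_n(x(n))$$

and

$$\sum_{k > x(n)} f_n(k) - f_n(x(n)) \le \int_{x(n)}^{\infty} f_n(x)\,dx \le \sum_{k > x(n)} f_n(k) + f_n(x(n)).$$

Adding the corresponding parts of the two expressions above and taking limits give

$$\lim_{n \to \infty} \sum_{k = x_0}^{\infty} f_n(k) - 2 \lim_{n \to \infty} f_n(x(n)) \le \lim_{n \to \infty} \int_{x_0}^{\infty} f_n(x)\,dx$$

$$\le \lim_{n \to \infty} \sum_{k = x_0}^{\infty} f_n(k) - \lim_{n \to \infty} f_n(x_0) + 2 \lim_{n \to \infty} f_n(x(n)).$$

The desired result follows from the conditions of the lemma. □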


The next lemma includes two trivial but useful facts.

Lemma 3. 1. For any real number $x \in [0, 1)$, $1 - x \ge \exp\left(-\frac{x}{1-x}\right)$.

2. For any real number $x \in (0, 1/2)$, $\frac{1}{1-x} < 1 + 2x$.

Proof. For part 1, the function $y = \frac{1}{1+t}e^t$ is strictly increasing over $[0, \infty)$, and has value 1
at $t = 0$. Therefore $\frac{1}{1+t}e^t \ge 1$ for $t \in [0, \infty)$. The desired inequality follows from the change of
variable $x = t/(1 + t)$. For part 2, the proof is trivial. □
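Both inequalities are elementary, and a quick grid check may help a reader confirm the direction of each. The sketch below evaluates them on an arbitrary grid; the function names are hypothetical.

```python
import math

def part_one_holds(x):
    """Check 1 - x >= exp(-x / (1 - x)) for x in [0, 1)."""
    return 1.0 - x >= math.exp(-x / (1.0 - x))

def part_two_holds(x):
    """Check 1 / (1 - x) < 1 + 2x for x in (0, 1/2)."""
    return 1.0 / (1.0 - x) < 1.0 + 2.0 * x

grid_one = [i / 1000.0 for i in range(0, 1000)]   # grid on [0, 1)
grid_two = [i / 1000.0 for i in range(1, 500)]    # grid on (0, 1/2)
print(all(part_one_holds(x) for x in grid_one),
      all(part_two_holds(x) for x in grid_two))
```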

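Lemma 4. For any given probability distribution $P = \{p_k; k \ge 1\}$, $n^{1-\delta} \sum_k p_k (1 - p_k)^n \to
c > 0$ for some constants $c > 0$ and $\delta \in (0, 1)$, if and only if $n^{1-\delta} \sum_k p_k e^{-np_k} \to c > 0$, as
$n \to \infty$.

Proof. Let $\delta^* = \delta/8$. Consider the partition of the index set $\{k; k \ge 1\} = I \cup II$ where
$I = \{k; p_k \le 1/n^{1-\delta^*}\}$ and $II = \{k; p_k > 1/n^{1-\delta^*}\}$.

Since $pe^{-np}$ has a negative derivative with respect to $p$ on the interval $(1/n, 1]$ and hence on
$[1/n^{1-\delta^*}, 1]$ for large $n$, $p_k e^{-np_k}$ is bounded above by its value at $p_k = 1/n^{1-\delta^*}$ for every $k \in II$.
Therefore, noting that there are at most $n^{1-\delta^*}$ indices in $II$ and that $1 - p \le e^{-p}$,

$$0 \le n^{1-\delta} \sum_{II} p_k (1 - p_k)^n \le n^{1-\delta} \sum_{II} p_k e^{-np_k} \le n^{1-\delta}\, n^{1-\delta^*}\, \frac{1}{n^{1-\delta^*}}\, e^{-n^{\delta^*}} = n^{1-\delta} e^{-n^{\delta^*}} \to 0.$$

Thus

$$\lim_{n \to \infty} n^{1-\delta} \sum_k p_k (1 - p_k)^n = \lim_{n \to \infty} n^{1-\delta} \sum_I p_k (1 - p_k)^n \qquad (3)$$

and

$$\lim_{n \to \infty} n^{1-\delta} \sum_k p_k e^{-np_k} = \lim_{n \to \infty} n^{1-\delta} \sum_I p_k e^{-np_k}. \qquad (4)$$

On the other hand, since $1 - p \le e^{-p}$ for all $p \in [0, 1]$,

$$n^{1-\delta} \sum_I p_k (1 - p_k)^n \le n^{1-\delta} \sum_I p_k e^{-np_k}.$$

Furthermore, applying 1) and 2) of Lemma 3 in the first and the third steps below
respectively leads to

$$n^{1-\delta} \sum_I p_k (1 - p_k)^n \ge n^{1-\delta} \sum_I p_k \exp\left(-\frac{np_k}{1 - p_k}\right)$$

$$\ge n^{1-\delta} \sum_I p_k \exp\left(-\frac{np_k}{1 - \sup_I p_k}\right) \ge n^{1-\delta} \sum_I \exp\left(-2n(\sup_I p_k)^2\right) p_k e^{-np_k}.$$

Noting the fact that $\lim_{n \to \infty} \exp(-2n(\sup_I p_k)^2) = 1$ uniformly by the definition of $I$, the two
sums over $I$ share the same limit, and hence, by (3) and (4), the lemma follows. □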

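Theorem 2. For any given probability distribution $P = \{p_k; k \ge 1\}$, if there exist
constants $\lambda > 1$, $c > 0$ and an integer $k_0 \ge 1$ such that for all $k \ge k_0$

$$p_k \ge ck^{-\lambda}, \qquad (5)$$

then $\lim_{n \to \infty} \tau_n = \infty$.

Proof. For clarity, the proof is given in two cases:

1. $p_k = ck^{-\lambda}$ for all $k \ge k_0$ for some $k_0 \ge 1$, and

2. $p_k \ge ck^{-\lambda}$ for all $k \ge k_0$ for some $k_0 \ge 1$.

Case 1: Assuming $p_k = ck^{-\lambda}$ for all $k \ge k_0$, it suffices to consider the partial series
$\sum_{k \ge k_0} n p_k (1 - p_k)^n$. First consider

$$n^{1-1/\lambda} \sum_{k=k_0}^{\infty} p_k e^{-np_k} = n^{1-1/\lambda} \sum_{k=k_0}^{\infty} ck^{-\lambda} e^{-nck^{-\lambda}} = \sum_{k=k_0}^{\infty} f_n(k)$$

where $f_n(x) = n^{1-1/\lambda} c x^{-\lambda} e^{-ncx^{-\lambda}}$. Since it is easily verified that

$$f_n'(x) = -\lambda c\, n^{1-1/\lambda} x^{-(\lambda+1)} \left(1 - ncx^{-\lambda}\right) e^{-ncx^{-\lambda}},$$

it can be seen that $f_n(x)$ increases over $[1, (nc)^{1/\lambda}]$ and decreases over $[(nc)^{1/\lambda}, \infty)$. Let
$x_0 = k_0$ and $x(n) = (nc)^{1/\lambda}$. It is clear that $f_n(x_0) \to 0$ and

$$f_n(x(n)) = n^{1-1/\lambda} c (nc)^{-1} e^{-1} = \frac{1}{e\, n^{1/\lambda}} \to 0.$$

Invoking the Euler-Maclaurin Lemma, we have, with changes of variable $t = x^{-\lambda}$ and then
$s = nct$,

$$n^{1-1/\lambda} \sum_{k=k_0}^{\infty} p_k e^{-np_k} \simeq \int_{x_0}^{\infty} n^{1-1/\lambda} c x^{-\lambda} e^{-ncx^{-\lambda}}\,dx = \frac{c}{\lambda}\, n^{1-1/\lambda} \int_0^{x_0^{-\lambda}} t^{-1/\lambda} e^{-nct}\,dt$$

$$= \frac{c^{1/\lambda}}{\lambda} \int_0^{ncx_0^{-\lambda}} s^{-1/\lambda} e^{-s}\,ds = \frac{c^{1/\lambda}}{\lambda}\, \Gamma\left(1 - \frac{1}{\lambda}\right) \left[\frac{1}{\Gamma(1 - 1/\lambda)} \int_0^{ncx_0^{-\lambda}} s^{(1 - 1/\lambda) - 1} e^{-s}\,ds\right]$$

$$\to \frac{c^{1/\lambda}}{\lambda}\, \Gamma\left(1 - \frac{1}{\lambda}\right) > 0.$$

Hence by Lemma 4, $n^{1-1/\lambda} \sum_k p_k (1 - p_k)^n \to \frac{c^{1/\lambda}}{\lambda} \Gamma(1 - 1/\lambda) > 0$ and therefore
$\tau_n \to \infty$.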
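The limit obtained in Case 1 can be compared against a direct numerical evaluation. The sketch below assumes the specific choice $\lambda = 2$, $c = 6/\pi^2$ and $k_0 = 1$ (so that $p_k = ck^{-2}$ is a genuine distribution), truncates the infinite sum at a hypothetical cutoff `terms`, and uses `math.gamma` for the Gamma function.

```python
import math

def scaled_sum(lam, c, n, terms):
    """n^(1 - 1/lam) * sum_{k=1}^{terms} p_k * exp(-n * p_k) with p_k = c * k^(-lam)."""
    total = 0.0
    for k in range(1, terms + 1):
        p = c * k ** -lam
        total += p * math.exp(-n * p)
    return n ** (1.0 - 1.0 / lam) * total

lam = 2.0
c = 6.0 / math.pi ** 2                 # makes sum_k c * k^(-2) equal to 1
limit = c ** (1.0 / lam) / lam * math.gamma(1.0 - 1.0 / lam)
value = scaled_sum(lam, c, n=10000, terms=200000)
print(value, limit)
```

For these settings the truncated sum lands within a small distance of the stated limit; the remaining gap comes from the finite $n$ and the truncation.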


Case 2: Assuming $p_k \ge ck^{-\lambda} =: q_k$ for all $k \ge k_0$ for some $k_0 \ge 1$, we first have

Since $f_n(x) = n^{1-1/\lambda} c x^{-\lambda} e^{-ncx^{-\lambda}} 1[x \ge (nc)^{1/\lambda}]$ satisfies the condition of the Euler-Maclaurin
Lemma with $x(n) = (nc)^{1/\lambda}$ and $f_n(x(n)) \to 0$, we again have

77,^“^ Z) r/ N A ck~^e~"'^’^ ^ = c ff° n^~^x~^e~^‘^^ ^Ib > [{n + l)c]^dx 

^fc>[(n+l)c]A L _ LV I ; J J 

= c r°° 1 ^dx = c^A“^r (l — A) r("-+i)c _i^g(i-x)-ig-s^g 

J[(n+l)c]V V \/ JO r(l-^) ^ ^ 

^ciA-ir(l- A) > 0. 

On the other hand, for sufficiently large n, I* = {k;pk < b {k; k > /cq}, by parts 1) 
and 2) of Lemma 3 at steps 2 and 4 below and (6) at step 7, we have

Ekei* ^ Efce/* ?fc(l - gfc)'' 

> Efce/* Qk exp (^- 

> Efce/* exp(-2n(snp^* gfc)^)gfce“’"'?'= 

> Efce/* exp(-2/n)gfce“”'?'' 

= exp(—2/n)n^“^/'^ Efce/* ^ 

cU-^r (1 - x) > 0. 

Finally $\tau_n = n \sum_k p_k (1 - p_k)^n \ge n \sum_{k \in I^*} p_k (1 - p_k)^n \to \infty$ as $n \to \infty$. □

Theorem 2 puts distributions with power decaying tails, for example $p_k = c_\lambda k^{-\lambda}$, and
those with slower decaying tails, for example $p_k = c_\lambda (k \ln^{\lambda} k)^{-1}$, where $\lambda > 1$ and $c_\lambda > 0$ is
a constant which may depend on $\lambda$, in Domain 2.

In view of Lemma 1 and Theorems 1 and 2, Domain 1 has a more intuitive definition
as given in the following lemma, the proof of which is trivial.

Lemma 5. A distribution $P$ on $\mathscr{X}$ belongs to Domain 1 if and only if 1) the effective
cardinality of $\mathscr{X}$ is $K = \infty$, and 2) $\tau_n \le u_P$ for all $n$ and some constant $u_P > 0$ which
may depend on $P$.

Lemma 6. For any $P = \{p_k\} \in \mathscr{P}_+$, if there exists an integer $k_0 \ge 1$ such that $p_k = c_0 e^{-k}$
for all $k \ge k_0$, where $c_0 > 0$ is a constant, then

1. $\tau_n \le u$ for some upper bound $u > 0$; and

2. $\lim_{n \to \infty} \tau_n$ does not exist.

Proof. Noting that the first finitely many terms of $\tau_n$ vanish exponentially fast for any distribution,
we may assume, without loss of generality, that $k_0 = 1$. For any given $n$, define $k^* = k^*(n)$
by

$$p_{k^*+1} < \frac{1}{n+1} \le p_{k^*}. \qquad (7)$$

Noting that the function $f_n(p) = np(1 - p)^n$ increases for $p \in (0, 1/(n+1))$ and decreases for
$p \in (1/(n+1), 1)$, we have for any $n$

$$f_n(p_k) \le f_n(p_{k^*}), \quad k \le k^*, \qquad\qquad f_n(p_k) \le f_n(p_{k^*+1}), \quad k \ge k^* + 1. \qquad (8)$$

Since $k^* = k^*(n)$ depends on $n$, we may express $p_{k^*}$ as, and define $c(n)$ by,

$$p_{k^*} = \frac{c(n)}{n}. \qquad (9)$$

There are two main consequences of the expression in (9). The first is that $\tau_n$ defined in
(1) may be expressed by (10) below; and the second is that the sequence $c(n)$ perpetually
oscillates between 1 and $e$.

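First, for each $n$, let us re-write each $p_k$ in terms of $p_{k^*}$, and therefore in terms of $n$ and
$c(n)$:

$$p_{k^*+i} = e^{-i} p_{k^*} = \frac{c(n)}{n e^{i}} \quad \text{and} \quad p_{k^*-j} = e^{j} p_{k^*} = \frac{c(n) e^{j}}{n}$$

for all appropriate positive integers $i$ and $j$. Therefore

$$f_n(p_{k^*+i}) = \frac{c(n)}{e^{i}} \left(1 - \frac{c(n)}{n e^{i}}\right)^n, \qquad f_n(p_{k^*-j}) = c(n) e^{j} \left(1 - \frac{c(n) e^{j}}{n}\right)^n,$$

and

$$\tau_n = \sum_{k \le k^*-1} f_n(p_k) + f_n(p_{k^*}) + \sum_{k \ge k^*+1} f_n(p_k)$$

$$= c(n) \sum_{j=1}^{k^*-1} e^{j} \left(1 - \frac{c(n) e^{j}}{n}\right)^n + c(n) \left(1 - \frac{c(n)}{n}\right)^n + c(n) \sum_{i=1}^{\infty} e^{-i} \left(1 - \frac{c(n)}{n e^{i}}\right)^n. \qquad (10)$$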

Next we want to show that $c(n)$ oscillates perpetually over the interval $[n/(n + 1), e)$,
which approaches $[1, e)$ as $n$ increases indefinitely. This is so because, since $k^*$ is defined
by (7), we have

$$\frac{c(n)}{n} e^{-1} < \frac{1}{n+1} \le \frac{c(n)}{n}$$

or

$$e^{-1} < \frac{n}{n+1} \le c(n) < \frac{n}{n+1}\, e < e. \qquad (11)$$

Furthermore by definition, $k^* = k^*(n)$ is an integer-valued increasing step function with
unit increments. Let $\{n_k; k \ge 1\}$ be the subsequence of $\mathbb{N}$ where $n_k$ is the positive integer
value $n$ at which $k^* = k^*(n)$ jumps to $k$ from $k - 1$. Since

$$c_0 e^{-(k^*+1)} < \frac{1}{n+1} \le c_0 e^{-k^*},$$

$$e^{-(k^*+1)} < \frac{1}{c_0(n+1)} \le e^{-k^*},$$

$$-(k^* + 1) < -\ln(c_0(n + 1)) \le -k^*,$$

$$k^* + 1 > \ln(c_0(n + 1)) \ge k^*,$$

we may write $k^* = \lfloor \ln(c_0(n + 1)) \rfloor$ for each $n$. Clearly for each sufficiently large value $k^*$
there are multiple corresponding values of $n$ sharing the same value of $k^*$, denoted in the
set $\{n_{k^*}, n_{k^*} + 1, \dots, n_{k^*+1} - 1\}$, and the size of the set increases indefinitely as $n \to \infty$.
Regarding the subsequence $\{n_{k^*}\}$ of $\mathbb{N}$, we have $1/n_{k^*} > p_{k^*} \ge 1/(n_{k^*} + 1)$ or

$$1 - p_{k^*} \le n_{k^*} p_{k^*} < 1, \qquad (12)$$

which implies that, for all sufficiently large $n$,

$$c(n_{k^*}) = n_{k^*} p_{k^*} \in (1 - \varepsilon, 1)$$

where $\varepsilon > 0$ is an arbitrarily small real value.

Similarly regarding the subsequence $\{n_{k^*+1} - 1\}$ of $\mathbb{N}$, we first have

$$p_{k^*} = p_{k^*+1}\, e = \frac{n_{k^*+1}}{n_{k^*+1}}\, p_{k^*+1}\, e = \frac{(n_{k^*+1}\, p_{k^*+1})\, e}{n_{k^*+1}}$$

and therefore by (12)

$$c(n_{k^*+1} - 1) = \left(\frac{n_{k^*+1} - 1}{n_{k^*+1}}\right) (n_{k^*+1}\, p_{k^*+1})\, e \to e, \qquad (13)$$

which implies that, for all sufficiently large $n$,

$$c(n_{k^*+1} - 1) > e - \varepsilon \qquad (14)$$

where $\varepsilon > 0$ is an arbitrarily small real value. Furthermore over the set $\{n_{k^*}, n_{k^*} +
1, \dots, n_{k^*+1} - 1\}$, by the definition of $c(n)$ it is easy to see that $c(n)$ strictly increases
with an exact increment of $p_{k^*}$, which decreases to zero as $n$ increases indefinitely. At this
point, it has been established that the range of $c(n)$ for $n \ge n_0$, where $n_0$ is any positive
integer, covers the entire interval $[1, e)$.

Noting $\mathbb{N} = \cup \{n_{k^*}, n_{k^*} + 1, \dots, n_{k^*+1} - 1\}$ where the union is over all possible integer
values of $k^*$, (12) and (14) jointly establish that the function $c(n)$ oscillates perpetually
over the entire range of $[1, e)$.
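The oscillation of $c(n)$ is easy to observe numerically. The sketch below assumes $k_0 = 1$, so that $c_0 = e - 1$ makes $p_k = c_0 e^{-k}$, $k \ge 1$, a probability distribution, and computes $k^*(n) = \lfloor \ln(c_0(n+1)) \rfloor$ and $c(n) = n p_{k^*}$; the function names are hypothetical.

```python
import math

C0 = math.e - 1.0  # normalizing constant: sum_{k>=1} C0 * exp(-k) = 1

def k_star(n):
    """Index k* with p_{k*+1} < 1/(n+1) <= p_{k*} for p_k = C0 * exp(-k)."""
    return math.floor(math.log(C0 * (n + 1)))

def c(n):
    """c(n) = n * p_{k*(n)}."""
    return n * C0 * math.exp(-k_star(n))

values = [c(n) for n in range(1000, 5001)]
print(min(values), max(values))  # sweeps from just below 1 up to just below e
```

Each time $k^*$ jumps by one, $c(n)$ drops back to roughly 1 and then climbs toward $e$ again in steps of $p_{k^*}$.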

The first part of the lemma follows, noting that $e^{-1} < c(n) < e$ (see (11)) and that
$1 - p \le e^{-p}$ for all $p \in [0, 1]$,



n 





For the second part of the lemma, consider, for any fixed c > 0, 



By Dominated Convergence Theorem, 


t{c) := lim 



and $t(c)$ is a non-constant function of $c$ on $[1, e]$.
The argument thus far implies that, as $n$ increases, $c(n)$ repeatedly visits any arbitrarily
small closed interval $[a, b] \subset [1, e]$ infinitely often, and therefore there exists for each such
interval a subsequence $\{n_l; l \ge 1\}$ of $\mathbb{N}$ such that $c(n_l)$ converges, i.e., $c(n_l) \to \theta$ for some
$\theta \in [a, b]$. Since $t(c)$ is a non-constant function on $[1, e]$, there exist two non-overlapping
closed intervals, $[a_1, b_1]$ and $[a_2, b_2]$ in $[1, e]$, satisfying

$$\max_{a_1 \le c \le b_1} t(c) < \min_{a_2 \le c \le b_2} t(c),$$

such that there exist two subsequences of $\mathbb{N}$, say $\{n_l; l \ge 1\}$ and $\{n_m; m \ge 1\}$, such that
$c(n_l) \to \theta_1$ for some $\theta_1 \in [a_1, b_1]$ and $c(n_m) \to \theta_2$ for some $\theta_2 \in [a_2, b_2]$.

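Consider the limit of $\tau_n$ along $\{n_l; l \ge 1\}$; again by the Dominated Convergence Theorem,

$$\lim_{n_l \to \infty} \tau_{n_l} = \lim_{n_l \to \infty} \left[ c(n_l) \sum_{j=0}^{k^*-1} e^{j} \left(1 - \frac{c(n_l) e^{j}}{n_l}\right)^{n_l} + c(n_l) \sum_{i=1}^{\infty} e^{-i} \left(1 - \frac{c(n_l)}{n_l e^{i}}\right)^{n_l} \right]$$

$$= \theta_1 \sum_{j=0}^{\infty} e^{j} e^{-\theta_1 e^{j}} + \theta_1 \sum_{i=1}^{\infty} e^{-i} e^{-\theta_1 e^{-i}} = t(\theta_1).$$

A similar argument gives $\lim_{n_m \to \infty} \tau_{n_m} = t(\theta_2)$, but $t(\theta_1) \ne t(\theta_2)$ by construction, and hence
$\lim_{n \to \infty} \tau_n$ does not exist. □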

A similar proof to that of Lemma 6 immediately gives Theorem 3 below with a slightly
more general statement.

Theorem 3. For any given probability distribution $P = \{p_k; k \ge 1\}$, if there exist
constants $a > 1$ and $c > 0$ and an integer $k_0 \ge 1$ such that for all $k \ge k_0$

$$p_k = c a^{-k}, \qquad (15)$$

then

1. $\tau_n \le u_a$ for some upper bound $u_a > 0$ which may depend on $a$; and

2. $\lim_{n \to \infty} \tau_n$ does not exist.

Theorem 3 puts distributions with tails of geometric progression, for example $p_k = c_\lambda e^{-\lambda k}$
where $\lambda > 0$ and $c_\lambda > 0$ are constants, or $p_k = 2^{-k}$, in Domain 1.

Next we develop a notion of relative dominance of one probability distribution over
another on a countable alphabet within $\mathscr{P}_+$. Let $\#A$ denote the cardinality of a set $A$.

Definition 2. Let $Q^* \in \mathscr{P}_+$ and $P \in \mathscr{P}_+$ be two distributions on $\mathscr{X}$, and let $Q = \{q_k\}$ be
a non-increasingly ordered version of $Q^*$. $Q^*$ is said to dominate $P$ if

$$\#\{i; p_i \in (q_{k+1}, q_k], i \ge 1\} \le M < \infty$$

for every $k \ge 1$, where $M$ is a finite positive integer.

It is easy to see that the notion of dominance by Definition 2 is a tail property, and
that it is transitive, i.e., if $P_1$ dominates $P_2$ and $P_2$ dominates $P_3$ then $P_1$ dominates $P_3$.
It says in essence that, if $P$ is dominated by $Q$, then the $p_i$s do not get overly congregated
locally into some intervals defined by the $q_k$s.
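The counting in Definition 2 can be sketched in code for truncated sequences. The helper `interval_counts` and the two example sequences below are hypothetical; with $p_k = 2 \cdot 3^{-k}$ and $q_k = 2^{-k}$, consecutive $p$ values differ by a factor of 3 while each interval $(q_{k+1}, q_k]$ spans only a factor of 2, so no interval should hold more than one $p_i$.

```python
def interval_counts(p, q):
    """For each k, count the p_i in (q_{k+1}, q_k]; q is first sorted non-increasingly."""
    q_sorted = sorted(q, reverse=True)
    return [sum(1 for x in p if lower < x <= upper)
            for upper, lower in zip(q_sorted, q_sorted[1:])]

q = [2.0 ** -k for k in range(1, 61)]          # truncated q_k = 2^{-k}
p = [2.0 * 3.0 ** -k for k in range(1, 31)]    # truncated p_k = 2 * 3^{-k}
print(max(interval_counts(p, q)))              # largest cluster size, a candidate for M
```

A truncated check of this kind can only suggest a bound $M$; the definition requires the bound to hold over all $k \ge 1$.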

The following examples shed a bit of intuitive light on the notion of dominance by
Definition 2.

Example 1. Let $p_k = c_1 e^{-k^2}$ and $q_k = c_2 e^{-k}$ for all $k \ge k_0$ for some integer $k_0 \ge 1$ and
two other constants $c_1 > 0$ and $c_2 > 0$. For every sufficiently large $k$, suppose $p_j = c_1 e^{-j^2} \le
q_k = c_2 e^{-k}$; then $-j^2 \le \ln(c_2/c_1) - k$ and $j + 1 \ge [k + \ln(c_1/c_2)]^{1/2} + 1$. It follows that

$$p_{j+1} = c_1 e^{-(j+1)^2} \le c_1 e^{-\left(\sqrt{k + \ln(c_1/c_2)} + 1\right)^2} = c_1 e^{-(k + \ln(c_1/c_2) + 1) - 2\sqrt{k + \ln(c_1/c_2)}}$$

$$= c_2 e^{-(k+1)} e^{-2\sqrt{k + \ln(c_1/c_2)}} \le c_2 e^{-(k+1)} = q_{k+1}.$$

This means that if $p_j \in (q_{k+1}, q_k]$ then necessarily $p_{j+1} \notin (q_{k+1}, q_k]$, which implies that
each interval $(q_{k+1}, q_k]$ can contain only one $p_j$ at most for a sufficiently large $k$, i.e.,
$k \ge k_{00} := \max\{k_0, \ln(c_2/c_1)\}$. Since there are only finitely many $p_j$s covered by $\cup_{1 \le k \le k_{00}} (q_{k+1}, q_k]$,
$Q = \{q_k\}$ dominates $P = \{p_k\}$.
Example 2. Let $p_k = c_1 a^{-k}$ and $q_k = c_2 b^{-k}$ for all $k \ge k_0$ for some integer $k_0 \ge 1$ and two
other constants $a > b > 1$. For every sufficiently large $k$, suppose $p_j = c_1 a^{-j} \le q_k = c_2 b^{-k}$;
then $-j \ln a \le \ln(c_2/c_1) - k \ln b$ and $j + 1 \ge k(\ln b / \ln a) + 1 + \ln(c_1/c_2) / \ln a$. It follows
that

$$p_{j+1} = c_1 a^{-(j+1)} \le c_1 a^{-\left(k \frac{\ln b}{\ln a} + 1 + \frac{\ln(c_1/c_2)}{\ln a}\right)} = c_1 a^{-\left(k \log_a b + 1 + \log_a(c_1/c_2)\right)}$$

$$= c_1 b^{-k} a^{-1} a^{-\log_a(c_1/c_2)} < c_1 b^{-(k+1)} a^{-\log_a(c_1/c_2)} = c_2 b^{-(k+1)} = q_{k+1}.$$

By a similar argument as that in Example 1, $Q = \{q_k\}$ dominates $P = \{p_k\}$.


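Example 3. Let $p_k = c_1 k^{-r} e^{-\lambda k}$ for some integer $k_0 \ge 1$ and constants $\lambda > 0$ and $r > 0$,
and $q_k = c_2 e^{-\lambda k}$ for all $k \ge k_0$. Suppose for a $k \ge k_0$ there is a $j$ such that $p_j = c_1 j^{-r} e^{-\lambda j} \in
(q_{k+1} = c_2 e^{-\lambda(k+1)}, q_k = c_2 e^{-\lambda k}]$; then

$$p_{j+1} = c_1 (j + 1)^{-r} e^{-\lambda(j+1)} = c_1 (j + 1)^{-r} e^{-\lambda j} e^{-\lambda} \le c_1 j^{-r} e^{-\lambda j} e^{-\lambda}$$

$$\le c_2 e^{-\lambda k} e^{-\lambda} = q_{k+1},$$

which implies that there is at most one $p_j$ in $(q_{k+1}, q_k]$ for every sufficiently large $k$.
Therefore $Q = \{q_k\}$ dominates $P = \{p_k\}$.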


Example 4. Let $p_k = c_1 k^{r} e^{-\lambda k}$ for some integer $k_0 \ge 1$ and constants $\lambda > 0$ and $r > 0$, and
qk = C 2 e~^^l’^'^^ for all k > ko- Suppose for any sufficiently large j, j > jo '■= — l] \ 

we have pj = Cij^e~^^ G {qk+i = qk = 026 “^'^/^^^] for some sufficiently large 

k > ko, then 

Pj+i = Ci{j + = Ci(j + 

< C2e~2^e~^ ~ C2e~^^^~^^^e~2 

< qk+ie-i = qk+i 

which implies that there is at most one $p_j$ in $(q_{k+1}, q_k]$ for every sufficiently large $k$.
Therefore $Q = \{q_k\}$ dominates $P = \{p_k\}$.


Example 5. Let $p_k = q_k$ for all $k \ge 1$. $Q = \{q_k\}$ and $P = \{p_k\}$ dominate each other.

While in each of Examples 1 through 4 the dominating distribution $Q$ has a thicker
tail than $P$ in the usual sense, the dominance of Definition 2 in general is not implied
by such a thinner/thicker tail relationship. This is so because a distribution $P \in \mathscr{P}_+$,
satisfying $p_k \le q_k$ for all sufficiently large $k$, could exist yet congregate irregularly to have
an unbounded $\sup_{k \ge 1} \#\{i; p_i \in (q_{k+1}, q_k], i \ge 1\}$. One such example is given in Section 3
below. In this regard, the dominance of Definition 2 is more appropriately considered as
a regularity condition. However it may be interesting to note that the said regularity is a
relative one in the sense that the behavior of $P$ is regulated by a reference distribution $Q$.
This relative regularity gives an umbrella structure in Domain 1 as demonstrated by the
theorem below.

Theorem 4. If two distributions $P$ and $Q$ in $\mathscr{P}_+$ on the same countably infinite alphabet $\mathscr{X}$
are such that $Q$ is in Domain 1 and $P$ is dominated by $Q$, then $P$ belongs to Domain 1.

Proof. Without loss of generality, it may be assumed that $Q$ is non-increasingly ordered. For
every $n$, there exists a $k_n$ such that $1/(n + 1) \in (q_{k_n+1}, q_{k_n}]$. Noting that the function $np(1 - p)^n$
increases in $p$ over $(0, 1/(n + 1)]$, attains its maximum value of $[1 - 1/(n + 1)]^{n+1} < e^{-1}$ at
$p = 1/(n + 1)$, and decreases over $[1/(n + 1), 1]$, consider
$\tau_n(P) = \sum_{k \ge 1} n p_k (1 - p_k)^n$

= npk{l - PkT + + npk{l - PkT + ^(1 " PkT 

< M Ek>k^+1 n^k{l - qkT + e-1 + MJ2i<k<k^ nqk{l - qkY 

= M Efc>i nqk{l - qkY + 

< MtniQ) + Me~^ < oo. 

The desired result immediately follows. □ 

Corollary 2. Any distribution $P$ on a countably infinite alphabet $\mathscr{X}$ satisfying $p_k = a e^{-\lambda k}$,
$p_k = b e^{-\lambda k^2}$, or $p_k = c k^{r} e^{-\lambda k}$ for all $k \ge k_0$, where $k_0 \ge 1$, $\lambda > 0$, $r \in (-\infty, +\infty)$, $a > 0$,
$b > 0$ and $c > 0$ are constants, is in Domain 1.

Proof. The result is immediate following Theorem 4 and Examples 1 through 4. □

3 Constructed Examples. 

The first constructed example shows that the notion of thinner tail, in the sense of $p_k \le q_k$
for $k \ge k_0$ where $k_0 \ge 1$ is some fixed integer and $P = \{p_k\}$ and $Q = \{q_k\}$ are two
distributions, does not imply a dominance of $Q$ over $P$.

Example 6. Consider any strictly decreasing distribution $Q = \{q_k; k \ge 1\} \in \mathscr{P}_+$ and the
following grouping of the index set $\{k; k \ge 1\}$.

$$G_1 = \{1\},\ G_2 = \{2, 3\},\ \dots,\ G_m = \{m(m - 1)/2 + 1, \dots, m(m - 1)/2 + m\},\ \dots.$$

$\{G_m; m \ge 1\}$ is a partition of the index set $\{k; k \ge 1\}$ and each group $G_m$ contains $m$
consecutive indices. A new distribution $P = \{p_k\}$ is constructed according to the following
steps:

1. For each $m \ge 2$, let $p_k = q_{m(m-1)/2+m}$ for all $k \in G_m$.

2. $p_1 = 1 - \sum_{k \ge 2} p_k$.

In the first step, $m(m - 1)/2 + m = m(m + 1)/2$ is the largest index in $G_m$ and therefore
$q_{m(m+1)/2}$ is the smallest $q_k$ with index $k \in G_m$. Since

$$0 < \sum_{k \ge 2} p_k = \sum_{m \ge 2} m\, q_{m(m+1)/2} \le \sum_{k \ge 2} q_k < 1,$$

$p_1$ so assigned is a probability. The distribution $P = \{p_k\}$ satisfies $p_k \le q_k$ for every
$k \ge 2 = k_0$. However the number of terms of $p_i$ in the interval $(q_{m(m+1)/2+1}, q_{m(m+1)/2}]$ is
at least $m$ and it increases indefinitely as $m \to \infty$; and hence $Q$ does not dominate $P$.
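The construction of Example 6 can be reproduced on a truncated index set. The sketch below assumes the specific choice $q_k = 2^{-k}$ for the strictly decreasing $Q$ and builds only the groups $G_2, \dots, G_{10}$; the names `q`, `build_p` and `count_in_interval` are hypothetical, and $p_1$ is computed from the truncated sum.

```python
def q(k):
    """A strictly decreasing distribution on k = 1, 2, ... (assumed choice)."""
    return 2.0 ** -k

def build_p(max_group):
    """Return {k: p_k} for the groups G_2..G_max_group, plus p_1 from the truncated sum."""
    p = {}
    for m in range(2, max_group + 1):
        last = m * (m + 1) // 2                 # largest index in G_m
        for k in range(last - m + 1, last + 1):
            p[k] = q(last)                      # every member gets the smallest q in G_m
    p[1] = 1.0 - sum(p.values())
    return p

def count_in_interval(p, k):
    """Number of p_i falling in (q_{k+1}, q_k]."""
    return sum(1 for v in p.values() if q(k + 1) < v <= q(k))

p = build_p(10)
for m in range(2, 11):
    print(m, count_in_interval(p, m * (m + 1) // 2))   # cluster size grows with m
```

The printed cluster sizes equal $m$, so no single bound $M$ can work for all intervals even though $p_k \le q_k$ for every $k \ge 2$.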

The second constructed example shows that the notion of the dominance of $Q = \{q_k\}$
over $P = \{p_k\}$, as defined in Definition 2, does not imply that $P$ has a thinner tail than $Q$,
in the sense of $p_k \le q_k$ for $k \ge k_0$ where $k_0 \ge 1$ is some fixed integer.


Example 7. Consider any strictly decreasing distribution $Q = \{q_k; k \ge 1\} \in \mathscr{P}_+$ and the
following grouping of the index set $\{k; k \ge 1\}$.

$$G_1 = \{1, 2\},\ G_2 = \{3, 4\},\ \dots,\ G_m = \{2m - 1, 2m\},\ \dots.$$

$\{G_m; m \ge 1\}$ is a partition of the index set $\{k; k \ge 1\}$ and each group $G_m$ contains 2
consecutive indices, the first one odd and the second one even. The construction of a new
distribution $P = \{p_k\}$ is as follows: for each group $G_m$ with its two indices $k = 2m - 1$
and $k + 1 = 2m$, let $p_k = p_{k+1} = (q_k + q_{k+1})/2$. With the new distribution $P = \{p_k\}$ so
defined, and since $Q$ is strictly decreasing, we have $p_{2m} > q_{2m}$ and $p_{2m-1} < q_{2m-1}$ for all
$m \ge 1$. Clearly $Q$ dominates $P$ ($P$ dominates $Q$ as well), but $P$ does not have a thinner tail
in the usual sense.
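Example 7 can also be checked on a truncated index set. The sketch below again assumes $q_k = 2^{-k}$ as the strictly decreasing $Q$; the helper names are hypothetical. It confirms that the pairwise averages alternate above and below $Q$ while each interval $(q_{k+1}, q_k]$ holds at most two of them.

```python
def q(k):
    """A strictly decreasing distribution on k = 1, 2, ... (assumed choice)."""
    return 2.0 ** -k

def p(k):
    """p_{2m-1} = p_{2m} = (q_{2m-1} + q_{2m}) / 2."""
    first = k if k % 2 == 1 else k - 1      # odd index of the pair containing k
    return (q(first) + q(first + 1)) / 2.0

def count_in_interval(k, max_index=60):
    """Number of p_i, i <= max_index, falling in (q_{k+1}, q_k]."""
    return sum(1 for i in range(1, max_index + 1) if q(k + 1) < p(i) <= q(k))

print([count_in_interval(k) for k in range(1, 11)])   # alternates 2, 0, 2, 0, ...
print(p(1) < q(1), p(2) > q(2))                        # no thinner tail in the usual sense
```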

At this point, it becomes clear that the notion of dominance of Definition 2 and the
notion of thinner/thicker tail in the usual sense are two independent notions.

The next constructed example shows that there exists a distribution such that the
associated $\tau_n$ approaches infinity along one subsequence of $n$ and is bounded above along
another subsequence of $n$, hence belonging to Domain T. Domain T is not empty.

Example 8 . Consider the probability sequence qj = 2~^, for j = 1,2,---, along with a 
dijfusion sequence di = 2*, for z = 1 , 2, • • •. A probability sequence {pk}, for k = 1,2, ■ ■ ■, 
is constructed by the following steps: 

1st: (a) Take the first value of $d_i$, $d_1 = 2^1$, and assign the first $2d_1 = 2^2 = 4$ terms of $q_j$,
$q_1 = 2^{-1}, q_2 = 2^{-2}, q_3 = 2^{-3}, q_4 = 2^{-4}$, to the first 4 terms of $p_k$:
$p_1 = 2^{-1}, p_2 = 2^{-2}, p_3 = 2^{-3}, p_4 = 2^{-4}$.

(b) Take the next unassigned term in $q_j$, $q_5 = 2^{-5}$, and diffuse it into $d_1 = 2$ equal
terms, $2^{-6}$ and $2^{-6}$.

i. Starting at $q_5$ in the sequence $\{q_j\}$, look forward ($j > 5$) for terms greater than
or equal to $2^{-6}$; if there are any, continue to assign them to $p_k$. In this case, there is
only one such term, $q_6 = 2^{-6}$, and it is assigned to $p_5 = 2^{-6}$.

ii. Take the $d_1 = 2$ diffused terms and assign them to $p_6 = 2^{-6}$ and $p_7 = 2^{-6}$.
At this point, the first few terms of the partially assigned sequence $\{p_k\}$ are

$p_1 = 2^{-1}, p_2 = 2^{-2}, p_3 = 2^{-3}, p_4 = 2^{-4}, p_5 = 2^{-6}, p_6 = 2^{-6}, p_7 = 2^{-6}$.

2nd: (a) Take the next value of $d_i$, $d_2 = 2^2$, and assign the next $2d_2 = 2^3 = 8$ unused terms
of $q_j$, $q_7 = 2^{-7}, \dots, q_{14} = 2^{-14}$, to the next 8 terms of $p_k$: $p_8 = 2^{-7}, \dots, p_{15} = 2^{-14}$.

(b) Take the next unassigned term in $q_j$, $q_{15} = 2^{-15}$, and diffuse it into $d_2 = 4$ equal
terms of $2^{-17}$ each.

i. Starting at $q_{15}$ in the sequence $\{q_j\}$, look forward ($j > 15$) for terms
greater than or equal to $2^{-17}$; if there are any, continue to assign them to $p_k$. In this case,
there are 2 such terms, $q_{16} = 2^{-16}$ and $q_{17} = 2^{-17}$, and they are assigned to
$p_{16} = 2^{-16}$ and $p_{17} = 2^{-17}$.

ii. Take the $d_2 = 2^2 = 4$ diffused terms and assign them to $p_{18} = 2^{-17}, \dots, p_{21} = 2^{-17}$.
At this point, the first few terms of the partially assigned sequence $\{p_k\}$ are

$p_1 = 2^{-1}, p_2 = 2^{-2}, p_3 = 2^{-3}, p_4 = 2^{-4}$,

$p_5 = 2^{-6}, p_6 = 2^{-6}, p_7 = 2^{-6}$,

$p_8 = 2^{-7}, p_9 = 2^{-8}, \dots, p_{15} = 2^{-14}, p_{16} = 2^{-16}$,

$p_{17} = 2^{-17}, p_{18} = 2^{-17}, \dots, p_{21} = 2^{-17}$.

$i$th: (a) In general, take the next value of $d_i$, say $d_i = 2^i$, and assign the next $2d_i = 2^{i+1}$
unused terms of $q_j$, say $q_{j_0} = 2^{-j_0}, \dots, q_{j_0+2^{i+1}-1} = 2^{-(j_0+2^{i+1}-1)}$, to the next
$2d_i = 2^{i+1}$ terms of $p_k$, say $p_{k_0} = 2^{-j_0}, \dots, p_{k_0+2^{i+1}-1} = 2^{-(j_0+2^{i+1}-1)}$.

(b) Take the next unassigned term in $q_j$, $q_{j_0+2^{i+1}} = 2^{-(j_0+2^{i+1})}$, and diffuse it into
$d_i = 2^i$ equal terms of $2^{-(j_0+i+2^{i+1})}$ each.

i. Starting at $q_{j_0+2^{i+1}}$ in the sequence $\{q_j\}$, look forward ($j > j_0 + 2^{i+1}$)
for terms greater than or equal to $2^{-(j_0+i+2^{i+1})}$; if there are any, continue to assign them to
$p_k$. Denote the last assigned $p_k$ as $p_{k_0'}$.

ii. Take the $d_i = 2^i$ diffused terms and assign them to $p_{k_0'+1} = \dots = p_{k_0'+2^i} = 2^{-(j_0+i+2^{i+1})}$.


In essence, the sequence $\{p_k\}$ is generated from the sequence $\{q_j\}$ by selecting infinitely
many indices $j$, at each of which $q_j$ is diffused into increasingly many equal probability
terms according to a diffusion sequence $\{d_i\}$. The diffused sequence is then rearranged in
non-increasing order. By construction, it is clear that the sequence $\{p_k; k \ge 1\}$ satisfies
the following properties:


$A_1$: $\{p_k\}$ is a probability sequence in non-increasing order.

$A_2$: As $k$ increases, $\{p_k\}$ is a string of segments alternating between two different types:
1) a strictly decreasing segment and 2) a segment (a run) of equal probabilities.

$A_3$: As $k$ increases, the length of the last run increases and approaches infinity.

$A_4$: In the $i$th run, there are exactly $d_i + 1$ equal terms, $d_i$ of which are diffused terms and
1 of which belongs to the original sequence $q_j$.

$A_5$: Between two consecutive runs (with lengths $d_i + 1$ and $d_{i+1} + 1$ respectively), the strictly
decreasing segment in the middle has at least $2d_{i+1} = 4d_i = d_i + 3d_i > d_i + d_{i+1}$ terms.

$A_6$: For any $k$, $1/p_k$ is a positive integer.
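The construction of Example 8 can be sketched in a few lines of code. This is a minimal illustration rather than part of the original argument; the function name `build_sequence` and its return format are hypothetical. It carries out steps (a), (b)(i) and (b)(ii) for a finite number of stages, using exact fractions so that the listed properties can be checked directly on the generated prefix.

```python
from fractions import Fraction


def build_sequence(stages):
    """Build the prefix of {p_k} produced by the first `stages` steps.

    Returns (p, runs), where `runs` lists (0-based start index, length)
    for each run of equal probabilities.
    """
    p, runs = [], []
    j = 1  # index of the next unused term q_j = 2^{-j}
    for i in range(1, stages + 1):
        d = 2 ** i  # diffusion size d_i
        # (a) copy the next 2 * d_i unused terms of q
        for _ in range(2 * d):
            p.append(Fraction(1, 2 ** j))
            j += 1
        # (b) diffuse the next unused term of q into d_i equal pieces
        piece = Fraction(1, 2 ** j) / d
        j += 1
        # (b)(i) keep copying the terms of q that are >= the diffused value
        while Fraction(1, 2 ** j) >= piece:
            p.append(Fraction(1, 2 ** j))
            j += 1
        # the last copied term equals `piece`, so it opens the run
        runs.append((len(p) - 1, d + 1))
        # (b)(ii) append the d_i diffused pieces
        p.extend([piece] * d)
    return p, runs


p, runs = build_sequence(2)
print([str(x) for x in p[:7]])
print(runs)
```

With two stages the prefix has 21 terms, matching the partial listing in the example, and the two runs have lengths $d_1 + 1 = 3$ and $d_2 + 1 = 5$.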

Next we want to show that there is a subsequence $\{n_i\} \subset \mathbb{N}$ such that $\tau_{n_i}$ defined with
$\{p_k\}$ approaches infinity. Toward that end, consider the subsequence $\{p_{k_i}; i \ge 1\}$ of $\{p_k\}$
where the index $k_i$ is such that $p_{k_i}$ is the first term in the $i$th run segment. Let $\{n_i\} = \{1/p_{k_i}\}$,
which by $A_6$ is a subsequence of $\mathbb{N}$. By $A_3$ and $A_4$,

$$\tau_{n_i} = n_i \sum_{k \ge 1} p_k (1 - p_k)^{n_i} > n_i (d_i + 1)\, p_{k_i} (1 - p_{k_i})^{n_i} = (d_i + 1)\left(1 - \frac{1}{n_i}\right)^{n_i} \to \infty.$$
Consider next the subsequence $\{p_{k_i-(d_i+1)}; i \ge 1\}$ of $\{p_k\}$ where the index $k_i$ is such that $p_{k_i}$
is the first term in the $i$th run segment, and therefore $p_{k_i-(d_i+1)}$ is the $(d_i+1)$th term counting
backwards from $p_{k_i-1}$ into the preceding segment of at least $2d_i$ strictly decreasing terms.
Let $\{m_i\} = \{1/p_{k_i-(d_i+1)} - 1\}$ (so $p_{k_i-(d_i+1)} = (m_i+1)^{-1}$), which by $A_6$ is a subsequence of
$\mathbb{N}$. Then

$$\tau_{m_i} = m_i \sum_{k \ge 1} p_k (1 - p_k)^{m_i} = m_i \sum_{k \le k_i-(d_i+1)} p_k (1 - p_k)^{m_i} + m_i \sum_{k \ge k_i-d_i} p_k (1 - p_k)^{m_i} =: \tau_{m_i,1} + \tau_{m_i,2}.$$

Before proceeding further, let us note several detailed facts. First, the function $np(1-p)^n$
increases in $[0, 1/(n+1)]$, attains its maximum at $p = 1/(n+1)$, and decreases in $[1/(n+1), 1]$.
Second, since $p_{k_i-(d_i+1)} = (m_i+1)^{-1}$, by $A_1$ each summand in $\tau_{m_i,1}$ is bounded above by
$m_i\, p_{k_i-(d_i+1)} (1 - p_{k_i-(d_i+1)})^{m_i}$ and each summand in $\tau_{m_i,2}$ is bounded above by
$m_i\, p_{k_i-d_i} (1 - p_{k_i-d_i})^{m_i}$. Third, by $A_4$ and $A_5$, for each diffused term $p_{k'}$ with $k' \le k_i-(d_i+1)$ in a
run there is a different non-diffused term $p_{k''}$ with $k'' \le k_i-(d_i+1)$ such that $p_{k'} > p_{k''}$
and therefore $m_i\, p_{k'} (1 - p_{k'})^{m_i} \le m_i\, p_{k''} (1 - p_{k''})^{m_i}$; similarly, for each diffused term
$p_{k'}$ with $k' \ge k_i-d_i$ in a run there is a different non-diffused term $p_{k''}$ with $k'' \ge k_i-d_i$
such that $p_{k'} < p_{k''}$ and therefore $m_i\, p_{k'} (1 - p_{k'})^{m_i} < m_i\, p_{k''} (1 - p_{k''})^{m_i}$. These facts imply
that

$$\tau_{m_i} = \tau_{m_i,1} + \tau_{m_i,2} = m_i \sum_{k \le k_i-(d_i+1)} p_k (1 - p_k)^{m_i} + m_i \sum_{k \ge k_i-d_i} p_k (1 - p_k)^{m_i} < 2 m_i \sum_{j \ge 1} q_j (1 - q_j)^{m_i} < \infty$$

and the last inequality above is due to Corollary\M 
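The two subsequences in Example 8 can also be examined numerically. The sketch below is an assumption-laden illustration, not part of the proof: the names `build_exponents`, `tau` and `subsequence_values` are hypothetical, the sequence is truncated after a finite number of stages (so the infinite sum is approximated from below), and floating-point arithmetic is used. It evaluates $n \sum_k p_k (1-p_k)^n$ at $n_i = 1/p_{k_i}$ and at $m_i = 1/p_{k_i-(d_i+1)} - 1$.

```python
import math


def build_exponents(stages):
    """Return exponents e_k with p_k = 2**(-e_k) and the 0-based run starts."""
    exps, run_starts = [], []
    j = 1  # index of the next unused q_j = 2**(-j)
    for i in range(1, stages + 1):
        d = 2 ** i
        exps.extend(range(j, j + 2 * d))       # step (a): copy 2 * d_i terms
        j += 2 * d
        piece = j + i                          # q_j / 2**i = 2**(-(j + i))
        exps.extend(range(j + 1, piece + 1))   # step (b)(i): q_{j+1}, ..., q_{j+i}
        run_starts.append(len(exps) - 1)       # last copied term opens the run
        exps.extend([piece] * d)               # step (b)(ii): d_i diffused terms
        j = piece + 1
    return exps, run_starts


def tau(n, exps):
    """Truncated tail index n * sum_k p_k * (1 - p_k)**n over the built prefix."""
    total = 0.0
    for e in exps:
        p = 2.0 ** (-e)
        # exp(n * log1p(-p)) evaluates (1 - p)**n stably for tiny p and huge n
        total += p * math.exp(n * math.log1p(-p))
    return n * total


def subsequence_values(stages=7, upto=5):
    """Rows (i, tau at n_i, tau at m_i) for the first `upto` runs."""
    exps, run_starts = build_exponents(stages)
    rows = []
    for i in range(1, upto + 1):
        d = 2 ** i
        k = run_starts[i - 1]                  # 0-based position of p_{k_i}
        n_i = 2 ** exps[k]                     # n_i = 1 / p_{k_i}
        m_i = 2 ** exps[k - (d + 1)] - 1       # m_i = 1 / p_{k_i - (d_i + 1)} - 1
        rows.append((i, tau(n_i, exps), tau(m_i, exps)))
    return rows


for i, t_n, t_m in subsequence_values():
    print(i, round(t_n, 3), round(t_m, 3))
```

On the computed range, the values along $\{n_i\}$ exceed the lower bound $(d_i+1)(1-1/n_i)^{n_i}$ and grow with $i$, while the values along $\{m_i\}$ stay bounded, in line with the two displayed inequalities.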

4 A Statistical Implication. 

While the domains of attraction on alphabets have probabilistic merit, the statistical
implication is also quite significant. Zhang and Zhou (2010) showed that $\zeta_{1,v}$ is estimable
(there exists at least one unbiased estimator of $\zeta_{1,v}$) and established an unbiased estimator
of $\zeta_{1,v}$ for every $v \le n - 1$. Their estimator is

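$$Z_{1,v} = \frac{n^{1+v}\,[n-(1+v)]!}{n!} \sum_{k \ge 1} \left[ \hat{p}_k \prod_{j=0}^{v-1} \left( 1 - \hat{p}_k - \frac{j}{n} \right) \right]. \qquad (16)$$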







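Therefore there readily exists an unbiased estimator of $\tau_v$ for every $v \le n - 1$, namely

$$\hat{\tau}_v = \frac{v\, n^{1+v}\,[n-(1+v)]!}{n!} \sum_{k \ge 1} \left[ \hat{p}_k \prod_{j=0}^{v-1} \left( 1 - \hat{p}_k - \frac{j}{n} \right) \right]. \qquad (17)$$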
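As a hedged illustration of how such an estimator might be computed from observed letter counts, the sketch below implements the standard form of the Zhang and Zhou (2010) estimator, $Z_{1,v} = \frac{n^{1+v}[n-(1+v)]!}{n!}\sum_k \hat{p}_k \prod_{j=0}^{v-1}(1 - \hat{p}_k - j/n)$, and the corresponding $\hat{\tau}_v = v Z_{1,v}$. The function names `z1v`, `tau_hat` and `expected_z1v_two_letters` are hypothetical, and the two-letter alphabet in the unbiasedness check is an assumed toy case.

```python
from fractions import Fraction
from math import comb, factorial


def z1v(counts, v):
    """Estimator of zeta_{1,v} = sum_k p_k (1 - p_k)**v from letter counts y_k."""
    n = sum(counts)
    if not 1 <= v <= n - 1:
        raise ValueError("v must satisfy 1 <= v <= n - 1")
    total = Fraction(0)
    for y in counts:
        term = Fraction(y, n)                  # relative frequency y_k / n
        for j in range(v):
            term *= Fraction(n - y - j, n)     # factor 1 - y_k / n - j / n
        total += term
    scale = Fraction(n ** (1 + v) * factorial(n - 1 - v), factorial(n))
    return scale * total


def tau_hat(counts, v):
    """Estimator of tau_v = v * zeta_{1,v}."""
    return v * z1v(counts, v)


def expected_z1v_two_letters(p1, n, v):
    """Exact expectation of z1v over all samples of size n from (p1, 1 - p1)."""
    p2 = 1 - p1
    return sum(
        comb(n, y) * p1 ** y * p2 ** (n - y) * z1v([y, n - y], v)
        for y in range(n + 1)
    )


p1 = Fraction(1, 3)
target = p1 * (1 - p1) ** 2 + (1 - p1) * p1 ** 2   # zeta_{1,2} for (1/3, 2/3)
print(expected_z1v_two_letters(p1, n=4, v=2) == target)
```

Exact fractions are used so that the unbiasedness check is an identity rather than a floating-point approximation; for large samples a logarithmic or floating-point evaluation would be more practical.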

Zhang and Zhou (2010) also established several useful statistical properties of $\hat{\tau}_v$, including
asymptotic normality and the fact that $\hat{\tau}_v$ is the uniformly minimum variance unbiased estimator
(umvue) when $K < \infty$.

The availability of $\hat{\tau}_v$ adds considerable merit to the discussion of the domains of
attraction on alphabets as presented in this paper. Specifically, the fact that the asymptotic
behavior of $\tau_n$ characterizes the tail probability of the underlying $P$, and the fact that the
trajectory of $\tau_v$ up to $v = n - 1$ is estimable, suggest that much could be revealed by a
sufficiently large sample.